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Abstract 

The typical view in evolutionary biology is that mutation rates are minimised. 
Contrary to that view, studies in combinatorial optimisation and search have 
shown a clear advantage of using variable mutation rates as a control param- 
eter to optimise the performance of evolutionary algorithms. Ronald Fisher's 
work is the basis of much biological theory in this area. He used Euclidean 
geometry of continuous, infinite phenotypic spaces to study the relation be- 
tween mutation size and expected fitness of the offspring. Here we develop a 
general theory of optimal mutation rate control that is based on the alterna- 
tive geometry of discrete and finite spaces of DNA sequences. We define the 
monotonic properties of fitness landscapes, which allows us to relate fitness 
to the topology of genotypes and mutation size. First, we consider the case 
of a perfectly monotonic fitness landscape, in which the optimal mutation 
rate control functions can be derived exactly or approximately depending on 
additional constraints of the problem. Then we consider the general case of 
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non-monotonic landscapes. We use the ideas of local and weak monotonic- 
ity to show that optimal mutation rate control functions exist in any such 
landscape and that they resemble control functions in a monotonic landscape 
at least in some neighbourhood of a fitness maximum. Generally, optimal 
mutation rates increase when fitness decreases, and the increase of mutation 
rate is more rapid in landscapes that are less monotonic (more rugged). We 
demonstrate these relationships by obtaining and analysing approximately 
optimal mutation rate control functions in 115 complete landscapes of bind- 
ing scores between DNA sequences and transcription factors. We discuss the 
relevance of these findings to living organisms, including the phenomenon of 
stress-induced mutagenesis. 

Keywords: Adaptation, Fitness landscape, Mutation rate, Population 
Genetics, Phenotypic Plasticity 
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1. Introduction 

Mutation is one of the most important biological processes that influence 
evolutionary dynamics. During replication mutation leads to a loss of infor- 
mation between the offspring and its parent, but it also allows the offspring 
to acquire new features. These features are likely to be deleterious, but have 
the potential to be beneficial for adaptation. Thus mutation can be seen as 
a process of innovation, which is particularly important as the number of all 
living organisms is tiny relative to the number of all possible organisms. A 
question that naturally arises with regards to mutation is whether there is 
an optimal balance between the amount of information lost and potential 
fitness gained. 

The seminal mathematical work to investigate biological mutation is by 
Ronald Fisher p], who considered mutation as a random motion in Euclidean 
space, the points of which are vectors representing collections of phenotypic 
traits of organisms. Using the geometry of Euclidean space, Fisher showed 
that probability of adaptation decreases exponentially as a function of mu- 
tation size (defined using the ratio of mutation radius and distance to the 
optimum), and concluded therefore that adaptation is more likely to occur 
by small mutations. Several studies, however, suggested that large muta- 
tions can be quite frequent in nature, thereby prompting re-examination of 
the theory [2j. Thus, Kimura [3] extended the theory to take into account 
differences in probabilities of fixation for mutations of small and large size. 
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Subsequently Orr [I] considered the effect of mutation across several replica- 
tions. Interestingly, while he had a critical role in developing mathematical 
theory around discrete alleles, Fisher in his geometric model uses Euclidean 
space, which is uncountably infinite and unbounded. That this is an impor- 
tant issue became apparent only after the realisation that biological evolution 
occurs in a countable or even finite space of discrete molecular sequences [5] . 
However, subsequent geometric models based on Fisher's, while they have 
explicitly modelled discrete mutational steps (e.g. [6]), continue to assume 
that they occur within the same infinite Euclidean space. This issue may 
contribute to the fact that the predictions of such models have at best only 
been partially verified in actual biological systems [7J [SJ EJ ED] • One of the 
contributions of the current work is that we consider mutation using the ge- 
ometry of other spaces, and in particular the geometry of a Hamming space, 
which is finite and leads to a radically different view about the role of large 
mutations. 

Mutation size as considered by Fisher is closely related to mutation fre- 
quency measured in biology in terms of the number of mutations per repli- 
cation per DNA base. Mutation rates in biology vary over several orders 
of magnitude [UJ. Nonetheless, mutation rate for any particular species is 
typically believed to be minimised, within bounds set by physiology [12], or 
more likely population genetics |13j . Despite this, mutation rates are known 
to vary within and among populations of a single species [2] and recently, 
population-genetic models have been developed proposing that variable mu- 
tation rates may be in fact adaptive in biology [T5] . 

Independent of such biological concerns, researchers in evolutionary com- 
putation and operations research have a longer history of considering variable 
mutation rates in genetic algorithms (GAs) (e.g. see [HI [T71 [TBI HS1 ED] for 
reviews). In particular, Ackley suggested in [2T] that mutation probability 
is analogous to temperature in simulated annealing, which decreases with 
time through optimisation. A gradual reduction of mutation rate was also 
proposed by Fogarty [22]- In a pioneering work, Yanagiya [23] used Markov 
chain analysis of GAs to show that a sequence of optimal mutation rates 
maximising the probability of obtaining global solution exists in any prob- 
lem. A significant contribution to the field was made by Thomas Back [24], 
who studied the probability of adaptation in the space of binary sequences 
and suggested that mutation rate should depend on fitness values rather than 
time. More recently, numerical methods have been used to optimise a muta- 
tion operator [20] that was based on the Markov chain model of GA by Nix 
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and Vose [22] • The complexity of this model, however, restricted the appli- 
cation of this method to small spaces and populations. It is these insights 
regarding mutation rate variation from evolutionary computation and opera- 
tions research which we develop here towards the particular issues presented 
by biological systems. 

We develop theory in the following directions: 

1. Generalise Fisher's geometric model of adaptation for metric spaces, 
and in particular for discrete spaces of sequences, such as the Hamming 
spaces with arbitrary alphabets. 

2. Define problems of optimal mutation rate control within such spaces, 
and study how different problem formulations (e.g. time horizon, ob- 
jective function) affect the solutions. 

3. Extend the theory to more biologically realistic (i.e. rugged) fitness 
landscapes. 

Some relevant results have already been reported. For example, results 
for general Hamming spaces were first reported in [261 127] . We develop these 
results towards biology in Section [2} Various optimisation problems were 
considered in [2H1 EH], deriving theoretical optimal mutation rate control 
functions. We address how such control functions may also be obtained nu- 
merically in Section [3j In Section [4], we develop theory to consider a fitness 
landscape as a memoryless communication channel between fitness values and 
distance from an optimal sequence. We introduce the ideas of local and weak 
monotonicity of a landscape. This allows us to formulate hypotheses about 
monotonicity and mutation rate control in biological fitness landscapes. We 
test these hypotheses by numerically obtaining optimal mutation rate con- 
trol functions for 115 published complete landscapes of transcription factor 
binding [30] . Our results presented in Section [5] show that all the optimal 
mutation rate control functions in these biological landscapes do indeed con- 
verge to non-trivial forms consistent with the theory developed here. We 
also observe differences among optimal mutation rate control functions, vari- 
ation that relates to variation in the landscapes' monotonic properties. We 
conclude in Section [6] by discussing how mutation rate control as considered 
here may be manifested in living organisms. 

2. A Generalisation of Fisher's Geometric Model of Adaptation 

In this section, we consider an abstract problem, in which organisms are 
viewed as points in some metric space and adaptation as a motion in this 
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space towards some target point (an optimal organism). In such formulation, 
maximisation of biological fitness corresponds to a minimisation of distance 
to the target, and geometry of the metric space allows us to solve the optimi- 
sation problem precisely. These abstract results will be used in the following 
sections to develop the theory further bringing it closer to biology. 

2.1. Representation and assumptions 

Let Q be a set of all possible organisms. Environment defines a prefer- 
ence relation < on Q (a total pre-order), so that a < b means b is better 
adapted to or has a higher replication rate in a particular environment than 
a. Throughout this paper we shall consider only the case of countable or 
even finite Q, although the theory can be easily extended with certain care 
to the uncountable case. It is well-known from game theory (e.g. [31]) that 
in the countable case the preference relation always has a utility representa- 
tion: there exists a real function / : Q — > M such that a < b if and only if 
f{ a ) — /(&)■ in the biological context, the utility function is called fitness, 
and it is usually defined to have non-negative values (i.e. if f(u) is the repli- 
cation rate of uS). Having positive fitness values is not essential, because the 
preference relation does not change under a strictly increasing transforma- 
tion of /, such as adding a constant e G E to / or multiplying it by a positive 
number (i.e. representation f(cu) is equivalent to \f(ou) + e for any A > 
and eel). Thus, our interpretation of fitness simply as a numerical rep- 
resentation of a preference relation on organisms is distinct from population 
genetic definitions of fitness (e.g. see [32J ) . We shall assume also that there 
exists a top (optimal) element T G fl such that /(T) = sup /(a;), which is 
the most adapted and quickly replicating specie in the current environment. 
Note that a finite set Q always contains at least one top (optimal) element 
T as well as at least one bottom element JL. 

Generally, one can consider also the set of all environments (including 
other organisms), because different environments 9 G Q impose different 
preference relations on Q, which have to be represented by different fitness 
functions f$(w) := f(u>,6). In this paper, however, we shall assume that 
a particular environment has been fixed, and therefore consider only one 
preference relation and one fitness function. 

During the replication, organism a can mutate into b with probability 
P(b | a), and the products P(b | a) ■ f(a) define the selection-mutation ma- 
trix — the infinitesimal generator of the replicator-mutator dynamics (gen- 
erally non-linear Markov evolution). Mutation can have different effects on 
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fitness of the offspring. Mutation a b can be deleterious, if f(a) > fib), 
neutral, if f(a) = f(b), or beneficial, if f(a) < f(b). We shall analyse how 
the probability of beneficial mutation can be related to the 'geometry' of 
mutation. 

Fitness is defined by the interaction of an organism with its environment, 
and therefore it is a property of a phenotype. Thus the set ft, which is the 
domain of the fitness function, can be thought of as not just the set of all 
organisms, but the set of all possible phenotypes. Reproduction of organisms, 
however, involves passing of information about the phenotypes in the form 
of codes, which can be elements of some other set. Consider a representation 
of phenotypes u G ft by points of a topological vector space H (e.g. a space 
of traits, a space of DNA sequences and so on). In information theory, a 
mapping k : ft — > % is called a code, and we shall assume here that it 
is uniquely decodable: k(o) = K,(b) implies a = b. That is, uj k{u) is 
an injection of ft into a possibly larger space H. In biological terms, each 
genotype has either one or no phenotype, and each phenotype has precisely 
one genotype. In addition, we shall assume that the image of k is closed 
under the operation of addition in H, which implies that for all a, 5 G ft, 
there exists c G ft such that k(cl) + k(c) = n(b). Thus, mutation a i— > b in ft 
can be represented in T-L by addition of codes k(o) and k(c) = n(b) — n(a), 
as shown on the following diagram: 

_ Mutation , ^ 



n 3 K(a) +K(C) > n(b) G H 

We shall assume that the topology in H is defined by a metric d : H x 
H — > [0, oo) (i.e. H is a metric vector space). Under a uniquely decodable 
mapping k, the metric on 7i induces an equivalent metric on Q representing 
'dissimilarity' of two phenotypes. Thus, abusing notation, we shall identify 
phenotypes u with their codes k(u) and write d(a, b) and b = a + c instead of 
d(K(a), n(b)) and nib) = k(cl) + k(c). A sphere and a ball of radius r G [0, oo) 
around every point a G Q is defined as usual: 



S(a,r) := {b G ft : d(a,b) = r} , B(a,r) := (J S(a, 

ne[0,r] 

If a mutates into b, then we call r = d(a, b) a mutation radius. 
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More generally, a representation may be non-uniquely decodable or even 
stochastic, in which case Q is not a metric space, but this will not be consid- 
ered here. Thus we consider a simplified picture of uniquely decodable geno- 
types. The motivation for distinguishing genotype and phenotype however 
will become apparent in Section [4] when we define the monotonic properties 
of general fitness landscapes. In particular, the radius r is the dissimilarity 
of the codes (e.g. genotypes) k(o) and n(b), and it depends on the choice of 
a representation space "H, its metric and the encoding-decoding schemes k 
and all of which may influence landscape monotonicity. 

2.2. Fisher's representation in Euclidean space 

In this section, we identify fitness f(u) with the negative distance — d(T, uj) 
from the top element, but later we shall generalise the relation between fit- 
ness and the topology of a representation space. Thus, adaptation (beneficial 
mutation) corresponds to a transition from a sphere of radius n = d(T, a) 
into a sphere of a smaller radius m = d(T,b), which is depicted in Figure [TJ 

T. 

... ;// 



Figure 1: Mutation of a into b by radius r = d(a,b). The distances n = d(T,a) and 
m = d(T, b) from an optimal element T define the fitnesses of a and b. The intersection 
of spheres S(a,r) and S(T,m) define the probability P{m \ n,r). 

This geometric view of mutation and adaptation is based on Ronald 
Fisher's idea [TJ, which was, perhaps, the earliest mathematical work on 
the role of mutation in adaptation. Fisher represented phenotypes by points 
of Euclidean space H, = M} of / G N traits, and therefore equipped Q with 
Euclidean metric <ie(a, b) = \\a — 6|| 2 (here || • ||2 denotes the standard £ 2 -norm 
in W). The top element T was identified with the origin in M 1 , and fitness 
f(cu) with the negative distance — g?e(T, uj). Then Fisher used geometry of 
the Euclidean space to show that probability of beneficial mutation decreases 
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exponentially as mutation radius increases, and therefore mutations of small 
radii are more likely to be beneficial. Despite subsequent development of the 
theory [2] , the use of Euclidean space for representation was not revised. 

Euclidean space is infinite, and the interior of any ball has always smaller 
volume than its exterior. Therefore, assuming mutation in random direc- 
tions, an organism on the surface of a ball around an optimum is always 
more likely to mutate into the exterior than the interior of this ball. This 
obvious and simple property is key for the conclusion that adaptation is more 
likely to occur by small mutations. Recently, we showed that the geometry 
of a finite space, such as the Hamming space of sequences, implies a different 
relation between the radius of mutation and adaptation [26, 27J. In partic- 
ular, mutation radius maximising the probability of adaptation varies as a 
function of the distance to the optimum. 

2.3. Probability of adaptation and representation in a Hamming space 

One of the most common examples of a finite metric space is a Hamming 
space of sequences. Let us denote by 1-L l a := {1, . . . , a} 1 the set of all sequences 
of letters from a finite alphabet {1, . . . , a} and length I. The alphabet can be 
equipped with operations of addition and multiplication such that it becomes 
a finite field GF(a) (a Galois field) and % l a becomes a linear algebra over 
GF(a). A linear algebra is also a vector space, and w a is an example of 
a finite vector space (there are a 1 points in V} a ). The space H l a can be 
equipped with the Hamming metric dH{a,b) := \{i : ^ b{\\ counting the 
number of different letters. The Hamming metric can also be defined as 
d H (a,b) := \\a — b\\ H , where || ■ \\h '■ H l a —> {0,1,...,/} is the Hamming 
weight counting the number of letters in a sequence not equal to the additive 
unit of the field GF(a) (zero of the field). Thus, addition of sequences results 
in a substitution of some letters, which corresponds to a simple mutation, 
and the Hamming distance counts the number of substitutions. 

Consider mutation of sequence a G S(T, n) into sequence b G S(T, m) by 
radius r = dn(a,b), as shown on Figure [I] Assuming equal probabilities for 
all sequences in the sphere S(a, r), the probability that the offspring sequence 
is in the sphere 5(T, m) is given by the number of elements in the intersection 
of spheres S(T,m) and S(a,r): 





(1) 



\S(a,r)\ 
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where | • | denotes cardinality of a set (the number of its elements). The 
cardinality of the intersection S(T,m) n S(a,r) with condition d(T,a) = n 
is computed as follows 



\S(T,m) n S(a,r)\ d{Tia)=n 



ro+r_+r+=min{r,m} 
r+— r_=n— max{r,m} 



r / V r_ / V r 



The summation runs over indexes r , r_ and r + satisfying conditions r + 
r_ + r + = min{r, m} and r + — r_ = n — max{r, m}. These conditions follow 
from the triangle inequalities for r, m and n, such as 



n — ml < r < n + m 



When r < m then ro, r_ and r + count respectively the numbers of neutral, 
deleterious and beneficial substitutions in r G [0,/]. They also satisfy the 
following constraints r_ G [0, [(r + m — n)/2\] and r + G [0, [(n — |r — m|)/2j], 
where |_"J denotes the floor operation. 

The cardinality of sphere S(a, r) C T-L l a is 

\S(a,r)\ = (a-iyQ (3) 

Equations ([l])- (J3j) allow us to compute the probability of adaptation in the 
Hamming space T-L l a , which is the probability that the offspring is in the 
interior of ball B(T,n): 

n—X 

P(m < n | n,r) = P(m \ n, r) (4) 

m=0 

Figure [2] shows the probability of adaptation in Hamming space H\ 00 (i.e. 
alphabet of size a = 4 and length I = 100) as a function of mutation radius 
r for different values of n — d(T,a). One can see that when n < 75 (more 
generally when n < 1(1 — l/a)), the probabilities of adaptation decrease with 
r, similar to Fisher's conclusion for the Euclidean space. However, for n = 75 
there is no such decrease, and when n > 75 (i.e. for n > 1(1 — l/a)), the 
probability of adaptation actually increases with r. This is due to the fact 
that, unlike Euclidean space, Hamming space is finite, and the interior of 
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Figure 2: Probability of adaptation P(m < n | n, r) in the Hamming space "H^ 00 as 
a function of mutation radius r. Different curves show P{m < n \ n,r) for different 
distances n — d#(T, a) of the parent sequence from the optimum T. 



ball B(T,n) can be larger than its exterior. The geometry of a Hamming 
space has a number of interesting properties (33] . For example, every point u 
has (a — 1)' diametric opposite points ->uj, such that dn{oj, ->uj) = I, and the 
complement of a ball B(u, r) in % l a is the union of (a — l) 1 balls B(-iu, I — 
r - 1). 

Remark (Representation space) . Using arbitrary alphabets is important not 
only because DNA molecules are sequences with a = 4 bases, but also be- 
cause it allows us to consider different representations, where the letters of 
the alphabet may correspond not to DNA base-pairs, but to higher-level 
structures such as triplets of DNA bases (encoding amino acids) or genes. 
Changing the representation by considering subsequences of a sequence as 
letters from an alphabet of a larger size a corresponds to decreasing the 
length / of the sequence. The Hamming metric, measuring the distance be- 
tween sequences, takes values in {0, . . . , I}, and changing the alphabet and 
length changes the geometry of the representation space H l a . 




Distance to optimum n = d(T, a) 



Figure 3: Optimal mutation rate control functions derived mathematically to minimise 
expected distance to optimum in Hamming space H\°. Different control functions are 
optimal solutions to different optimisation problems described in Section [2. 5| 

Remark (Variable lengths) . Hamming metric compares sequences of the same 
lengths, and it counts the least number of substitutions between a pair of 
sequences, which is the main mutation mechanism that we consider here. 
Variable length sequences can be compared using, for example, the Levensh- 
tain metric, which counts the least number of substitutions, insertions and 
deletions. The space of all variable length finite alphabet sequences is count- 
ably infinite, and it can be considered as a vector space over an extended 
Galois field [31]. Hamming spaces are finite subspaces of this space, and one 
can consider the set of nested Hamming spaces, where increasing complexity 
corresponds to an increasing sequence length [27]. We note also that every 
such finite subspace has a top element T, but the whole space of variable 
length sequences may fail to have one. 

2.4- Random mutation 

By mutation of parent sequence a into b we understand a random process, 
so that the mutation radius is a random variable. The simplest form of 



12 



mutation, called point mutation, is the random process of independently 
substituting each letter in a to any of the other a — 1 letters with probability 
\i. At its simplest, with one parameter, there is an equal probability \ij (a — 1) 
of mutating to each other base. Such mutation corresponds also to additive 
noise: b = a + c, where c is a sequence obtained by point mutation of the 
origin in % l a (the sequence with all letters equal to zero — the additive unit 
of the field GF(a)). The parameter // is called mutation rate. For point 
mutation, the probability of mutating by radius r G [0, /] is given by the 
binomial distribution: 

W *)=(') KnY(l-Kn)) l - r (5) 

The expected value and variance of the mutation radius are respectively 
E M {r} = l\i and a 2 {r) = — /i). Note in the equation above that we 
assume that mutation rate [i may depend on the distance n = dn(J~, a) from 
the top sequence. 

Optimisation of the mutation rate requires knowledge of the probability 
P^{m | n) that the offspring sequence b is in the sphere S(T,m) that can be 
expressed as follows: 

i 

P^(m | n) = P(m | n, r) P^{r \ n) (6) 

r=0 

Equations ([IJ-Q and ^ can be substituted into equation ^ to obtain the 
precise expression for transition probability P^m \ n) in T-L l a . 

Remark (Optimal mutation operator). Mutation in biology is much more 
complex than described above, and its precise mathematical modelling in- 
volves many parameters. One parameter point mutation, however, is optimal 
in a certain sense: it is a solution of one specific variational problem of min- 
imisation of expected distance between points a and b in a Hamming space 
subject to a constraint on mutual information between a and b. We define 



and solve this problem in Appendix C The optimal solutions are conditional 



probabilities having exponential form Pp(b \ a) oc exp[— f3 dn(a, &)], where pa- 
rameter (3 > 0, called the inverse temperature, is related to the constraint on 
mutual information. Because Hamming metric (i^(a,6) = \\a — b\\n is com- 
puted as the sum Y^i=i $ai(pi) of elementary distances S ai (bi) between letters 
cij and bi in ith position in the sequence, and the values of S ai (bi) do not de- 
pend on the position i, the exponential conditional probability factorises into 
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the product Pp(b \ a) oc Yli=i e~ /35a ^ 6i - ) corresponding to independent substi- 
tution of letters a, into fej with equal probabilities — 1), where /i is related 
to the inverse temperature (5. Changing the representation space H, and its 
metric will result in a different optimal mutation operation. For example, 
if % is the space of variable length sequences with Levenshtain metric, then 
optimal mutation Pp{b \ a) will involve independent substitutions, insertions 
and deletions. If elementary distances S ai (bi) are different between different 
pairs of letters, then there will be different parameters for different pairs. If 
elementary distances depend on the position i in a sequence or the metric 
d(a, b) is not the sum of elementary distances, then the optimal mutation 
is a more complex process with non-independent substitutions, insertions or 
deletions, the phenomenon known in biology as epistasis. 

2. 5. Optimal control of mutation rates 

The fact that we have shown above that the probability of adaptation 
depends on mutation rate introduces the possibility of organisms maximising 
the expected fitness of their offspring by controlling mutation rate. The exact 
form of the optimal mutation rate control functions depends on a number 
of factors, such as the time horizon. Here we cover the principal elements 
required, developed in [25] . 

Let Pt(a) be the distribution of parent sequences in H l a at time t, and let 
Pt(n) = ^ a . rf ( T a ) =n Pt(a) be the distribution of their distances n = dn(T, a) 
from the optimum. Transition probabilities P(m \ n) define linear transfor- 
mation of Pt{n) into distribution P t+ i(m) of distances m = dn(T, b) of their 
offspring from the optimum: 

I 

P t+ i{m) = ^P(m | n)P t (n) 

n=0 

If this linear transformation T(-) := Y2n=oP(, m I n )(') does not change with 
time and assuming that distance to the optimum has Markov property (i.e. 
distance at t + 1 depends only on distance at t, but not at t — 1, t — 2, etc), 
then the distribution P t+S (m) after s generations is defined by T s (-), the sth 
power of T(-). 

According to equation transition probabilities P^{m \ n) from sphere 
S(T,n) to S(T,m) depend on the mutation rate parameter /i for each dis- 
tance n from top sequence T, and we call the collection of pairs the 
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mutation rate control function n(n). The expressions for the transition prob- 
abilities P M (m | n) between spheres around optimal element T 6 H l a can be 
used to optimise this function. This optimisation, however, can be done with 
respect to different criteria leading to different optimal functions. For exam- 
ple, after one replication, the conditional expected distance to the optimum 
E{m | n} = Xl!n=o m -^( m I n ) * s mmrm ised if the mutation rate \i depends 
on n according to the following step function: 

f if n < 1(1- 1/a) 
fi(n) = I \ if n = 1(1 -1/a) (7) 
y 1 otherwise 

This function is shown on Figure [3] for Hamming space 1-L\ . The sudden 
change of the optimal mutation rate from /i = to // = 1 at n = 1(1 - 
1 /a) corresponds to the sudden change of the effect of the mutation radius 
on the probability of adaptation shown on Figure |2j If parent sequences 
are uniformly distributed Pt(a) = a~ l in 7-L l a , then mutation of sequences 
with this control function achieves the greatest decrease E{n} — E{m} = 
Yln=o n Pt{ n ) — X]L=o m Pt+i( m ) of the expected distance to the optimum. 
Note, however, that sequences with n = dn(T, a) < /(l — 1/a) do not mutate. 
Therefore, if after several generations all sequences are closer to T than 
1(1 — 1/a), then their offspring cannot get closer to T. In the space of binary 
sequences (a = 2) this occurs after only one replication. For this reason, 
the control of mutation by the step function is not optimal for adaptation in 
more than one generation. 

Deriving a mutation rate control function minimising the expected dis- 
tance to the optimum after several generations is not a trivial task. However, 
for a sufficiently large number of generations this problem is equivalent to 
minimising the expected time at which individuals achieve maximum fit- 
ness. The expected convergence times can be computed using techniques 
for absorbing Markov chains, and numerical methods show that the optimal 
mutation rate control changes in this case from a step to a more smooth, 
sigmoid-like function [28] . 

A simpler but closely related problem is maximisation of probability 
Pfi(b = T | a) of mutating directly to the optimum, or maximisation of 
the probability P M (m = | n), which has the following expression: 

P^(m = Q\ n ) = {a- l)~ n /i n (l - n) l ~ n (8) 
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Conditions dP^/d/i = and d 2 P fJj /dfi 2 < denning the mutation rate max- 
imising this probability lead to the equation n — Ifi = and the following 
linear mutation rate control function shown on Figure [3] for T-L\°: 

fi(n) = - (9) 

This variation of optimal control functions illustrates the importance of the 
number of generations (time horizon) for which the expected fitness is max- 
imised, as pointed out previously by Orr 

Another approach to mutation rate control is to maximise the probability 
of adaptation: 

n-l 

P M (m < n | n) — Pn(fn \ n) 

Back obtained the mutation rate function fi(n) maximising this probability 
(which he called the probability of success) in the space 1-L 2 of binary se- 
quences [21]. The expressions from the previous section allow us to obtain 
similar functions for general Hamming spaces T-L l a . Figure [3] shows this func- 
tion for T-L\°. We note that the comparison m < n used in the probability of 
adaptation and its maximisation effectively changes fitness from being abso- 
lute (i.e. depending only on an individual) to relative (e.g. depending also on 
the parent of an individual). Indeed, maximisation of P(m < n \ n) is equiv- 
alent to maximisation of the expected value E{/2(^, n) \ n} of a two- valued 
relative fitness function f2(m,n) — 1 if m < n; f'2(m,n) = otherwise. 

Another approach that we pursued elsewhere is based on information 
theory [271 [22]. In brief, the optimisation of expected fitness is performed 
subject to constraints on information divergence of distribution P t+1 (m) from 
distribution Pt{n). The resulting optimal mutation rates /x(n) correspond to 
cumulative probabilities Po(m < n) = Y^m=aPo( m ), where Po(m) is the 
distribution of m = d(T, a) assuming uniform distribution P(u) = a~ l of 
sequences in T-i l a . Figure [3] shows this function for Hi . We point out that 
this control not only achieves a very fast decrease of the expected distance 
E{m} to the optimum, but the resulting populations also have the smallest 
variance cr 2 (m) of the distances. 

There are other optimisation criteria, such as maximisation of cumulative 
expected fitness, that may lead to different optimal control functions. Thus, 
Figure [3] and this discussion illustrates the fact that there is no single optimal 
mutation rate control function, but a variety of functions each of which solves 
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a specific optimisation problem. However, it is also evident from Figure [3] 
that all these control functions have a common property of monotonically in- 
creasing mutation rate with increasing distance of parent sequence from the 
optimum. Where an evolutionary system optimises a particular criterion, 
such as one of those considered in this section, on a monotonic landscape, 
the optimal mutation rate control function will be the corresponding derived 
function. In Section [3] we shall consider an approximation technique appli- 
cable to a more general class of problems including cases where derivation is 
impractical. In Section [4] we relax the assumption of a monotonic landscape. 

3. Evolutionary Optimisation of Mutation Rate Control Functions 

Analytical approaches cannot always be applied to derive optimal mu- 
tation rate control functions due to high problem complexity. Another ap- 
proach is to use numerical optimisation or evolutionary techniques to obtain 
approximately optimal solutions. In this section, we introduce such an evolu- 
tionary technique that uses two genetic algorithms. The first, which we refer 
to as the Inner-GA, evolves sequences with the mutation rate controlled by 
some function that maps fitness to mutation rate. The second, which we refer 
to as the Meta-GA, evolves a population of such mutation rate control func- 
tions for better performance of the Inner-GA. In this section, we describe the 
details of these algorithms and report results of experiments. The Inner-GA 
can use any fitness function. First, we shall apply the techique to the case 
when fitness of an individual is its negative distance from a selected point 
in a Hamming space. Later we shall apply the technique to more general 
non-monotonic fitness landscapes. 

3.1. Inner-GA 

The Inner-GA is a simple generational genetic algorithm that uses no 
selection and no recombination. Each genotype in the Inner-GA is a sequence 
oj G H l a , and we used populations of 100 individuals. The initial population 
had equal numbers of individuals at each fitness value, and they were evolved 
by the Inner-GA for 500 generations using simple point mutation, according 
to a mutation rate control function specified by the Meta-GA. The fitness 
can be defined by an arbitrary real function y = f(u>), and the average fitness 
y(t) of the population is calculated at each generation, in order that expected 
fitness E{y}(£) may be maximised by the Meta-GA. 
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Figure 4: Means and standard deviations of mutation rates for the fittest control functions 
evolved in each of 20 runs of the Meta-GA using Inner-GAs with individuals in V.4 evolved 
to minimise expected distance to the optimum after 500 generations. 



3.2. Meta-GA 

The Meta-GA is a simple generational genetic algorithm that uses tour- 
nament selection (a good choice when little is known or assumed about the 
structure of the landscape). Each genotype in the Meta-GA is a mutation 
rate function fj,(y) of fitness values y. The domain of fi(y) is an ordered par- 
tition of the range {y : f(u) = y, u G "H^} of the Inner-GA fitness function. 
Thus, individuals in the Meta-GA are sequences of real values ji G [0, 1] 
representing probabilities of mutation at different fitnesses, as used in the 
Inner-GA. 

At each generation of the Meta-GA, multiple copies of the Inner-GA were 
evolved for 500 generations, with the mutation rate in each copy controlled 
by a different function /i(y) taken from the Meta-GA population. We used 
populations of 100 individual functions, which were initialised to /i(y) = 0. 
All runs within the same Meta-GA generation were seeded with the same 
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initial population of the Inner-GA. The Meta-GA evolved functions fi(y) for 
5 ■ 10 5 generations to maximise the average fitness y(t) « E,{y}(t) in the final 
generation of the Inner-GA. 

The Meta-GA used the following selection, recombination and mutation: 

• Randomly select three individuals from the population and replace the 
least fit of these with a mutated crossover of the other two; repeat with 
the remaining individuals until all individuals from the population have 
been selected or fewer than three remain. 

• Crossover recombines the start of the numerical sequence representing 
one mutation rate function with the end of another using a single cut 
point chosen randomly, excluding the possibility of being at either end 
so that there are no clones. 

• Mutation adds a uniform-random number A/i 6 [— .1, .1] to one ran- 
domly selected value fi (mutation rate) on the individual mutation rate 
function but then bounds that value to be within [0, 1]. 

The Meta-GA returns the fittest mutation rate function fi(y). 

3. 3. Evolved control functions 

The kind of mutation rate control function the Meta-GA evolves depends 
greatly on properties of the fitness landscape used in the Inner-GA. In Sec- 
tion 2.5 we showed theoretically that for f(u>) corresponding to negative 
distance to optimum — d#(T,u;), the optimal mutation rate increases with 
n = cIh(T,u}). Therefore, the population of mutation rate functions in the 
Meta-GA should evolve the same characteristics in such a landscape. Fig- 
ures [4] shows the average and standard deviations of the fittest control func- 
tions evolved in 20 runs of the Meta-GA using Inner-GAs with individuals 
in Til (i.e. a — 4, I = 10) and fitness defined by f(u) = — dn(T,u)). As 
predicted, the mutation rate increases with n = dn{T,uj). We shall now 
consider more complex landscapes. 



4. Locally and Weakly Monotonic Fitness Landscapes 

The logic behind the variation and optimal control of mutation rates de- 
scribed in the previous section was based on the assumption that fitness 
f(ui) is isomorphic to negative Hamming distance — d#(T,u;) from the top 
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sequence, which allowed us to derive optimal control functions using the ge- 
ometry of the space of sequences. As detailed below, this assumption implies 
global monotonicity of the fitness landscape, and it is highly unlikely in real 
biological landscapes, which can be rugged (35]. In this section, we define 
the concept of a local and weak monotonicity relative to a chosen metric 
and show that all landscapes are weakly monotonic at least in some small 
but non-trivial neighbourhood of the top sequence. This relation between 
fitness and distance allows one to implement a control of mutation rate using 
feedback from fitness values. We then consider how monotonicity of different 
landscapes may influence these fitness-based optimal control functions. 

4-1. Memoryless communication between fitness and distance 

If fitness y = f(u) is not isomorphic to the negative distance n = dn(T, oj) 
from the optimum, then fitness values of the sequences do not provide full in- 
formation about their distances. Thus, in order to employ the optimal control 
n(n) of mutation rate based on the distance from the top sequence, one has 
to estimate the distance from fitness values. The estimation of unobserved 
random variable n = dH(T,u t ) at time t from a sequence y t , yt-i, ■ ■ ■ , Vo 
of observed random variables is known as the filtering problem [36]. Note 
that generally the observed process {yt}t>o is not Markov (i.e. P{y t+1 \ 
y t ,...,y ) 7^ P{yt+i I Vt)), even if the unobserved process {n} t > and the 
joint process {(n, yt)}t>o are - F° r this reason, the optimal control of mutation 
rate should be a function fi(yt, ■ ■ ■ ,yo) of the entire history of observations. 
It seems unlikely, however, that such a control has biological relevance, as its 
implementation would require information about fitness values in all previous 
generations. Instead, we shall consider a control based only on the current 
fitness value y t . Our analysis will focus on monotonic properties of the fit- 
ness landscape that will allow us to relate transition probability P^(y t +i \ yt) 
between fitness values of the parent and offspring with probability P^m | n) 
of transitions between spheres of different radii around the optimum. We 
shall demonstrate that the 'similarity' between these transition probabili- 
ties increases as sequences evolve closer to the optimum, and for this reason 
the optimal control function /i(?/t) based on the current fitness values should 
closely resemble the distance-based optimal control function /i(n) in some 
neighbourhood of the optimum. 

By a fitness landscape, we understand it to mean a graph of a function 
/ o k~ 1 : T-L l a — > M which associates representations k(uj) G H (codes) of 
individuals with their fitness values y = f{u>). The landscape defines a joint 
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distribution P(y, n) of the fitness values y = f(u>) and distances n = d H (T, u) 
from the nearest global optimum. This joint distribution defines conditional 
probabilities P{n \ y) and P(y \ n). Let us consider mutation of sequence a 
into sequence b, and let us denote by n = d H (T,a) and m = dn(T,b) their 
distances from the nearest optimum and by y t = f(a) and y t+ i = fib) their 
fitness values. Thus, given sequence b, its fitness and distance values y t+ \ 
and m are independent of the parent sequence a. We shall assume further 
that given distance m, the fitness value y t +i is also independent of distance 
n: P(yt+i \ m,n) — P(yt+i \ m). One can show that this is equivalent to 
conditional independence of y t+ i and y t given distances m and n: P(y t+1 , y t \ 
m,n) = P(yt+i \ rn) P(y t \ n). The transition probability P^(y t+ i \ yt) in this 
case can be expressed as a composition of transition probabilities P{n \ y t ), 
Pfj,(m | n) and P(yt+i \ m) in the following way: 

i i 

P»(yt+i \yt) = ^2^2 p (yt+i I m)P fl (m | n)P(n \ y t ) 

m=0 n=0 

Thus, we assume that the fitness landscape acts as a memoryless communica- 
tion channel between distances of individuals to the nearest global optimum 
and their fitness values. The amount of information communicated through 
this channel defines how 'similar' the conditional probabilities P^yt+i \ yt) 
and P^m \ n) are and how effective a mutation control function /j,(y) is. 

If fitness values y = f(u) of sequences and their distances n = dniT^u) 
from the nearest global optimum are statistically independent, then P(n | 
y t ) = P(n), P(y t +i | m) = P{y t +i) and therefore P^yt+i \ y t ) = P{Vt+i)- 
This means that fitness y t+ \ of the offspring is independent of fitness y t of its 
parent, and therefore a control of mutation rate will have no effect on fitness 
of the offspring. On the other hand, if there is a one-to-one correspondence 
between the fitness values y = f(u) and distances n = dH(T,u) (i.e. there 
is a bijection g : E — > WL such that f(u) = g o d H (T,u) and g" 1 o f(uj) = 
d H (T,u)), then P^{y t+ \ \ y t ) = P^ijn = g' l {y t+ i) \ n = g^(y t )), and 
the optimal mutation rate control function is /i o g~ 1 (y), where fi(n) is an 
optimal control function obtained using P At (m | n). In particular, the identity 
f(cu) = —dH(T,Lj) used in previous section is established by g(-) = — 1 x (•). 



In Appendix A|the relationship between transition probabilities P(y t +i \ yt) 



and P{m \ n) is explained in more detail. 
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4-2. Monotonicity of fitness landscapes 

Let us consider landscapes in which fitness and distance to nearest global 
optimum are not isomorphic but there is a deterministic mapping between 
them. Moreover, we shall consider monotonic properties of these mappings, 
which allow us to clarify notions of 'smooth' or 'rugged' fitness landscapes, 
used in biological literature. Note that these monotonic properties are rel- 
ative to (i.e. depend on) the choice of a representation space, its metric d 
and encoding-decoding scheme. Below we introduce the definitions of various 
monotonic properties of landscapes which later allow us to analyse rugged 
biological landscapes and address optimal control of mutation rate in such 
landscapes. 

Definition 1 (Local monotonicity). Let (Q,d) be a metric space, and let 
/ : Q — > R be a real function. Then, if all a and b inside some ball B(u, 5) 
satisfy the properties below, we say that: 

• d is locally monotonic relative to / at u if: 

-d(u,a)<-d(u,b) <= f(a)<f(b) 

• f is locally monotonic relative to metric d at u if: 

-d(u,a)<-d(u,b) =s> f(a)<f(b) 

• f and d are locally isomorphic at u if: 

-d{u,a)<-d{u,b) <=► f(a)<f(b) 

• We say that d or / are globally monotonic (isomorphic) at T relative 
to each other if the relevant property holds over B(T, 5) = Q. 

The three monotonic relations between fitness and distance defined above 
are illustrated on Figure |5j These cases represent idealised situations, but 
they help in understanding the properties of real and biologically relevant 
landscapes. Let us first consider global monotonicity, that is when the mono- 
tonic properties hold for the entire Q. 

The monotonic relationships between distance d(T,u) and fitness f(uj) 
can be represented by real monotonic functions h : R — >■ R and g : R — > R 
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Figure 5: Schematic representation of monotonic properties described in Definition [T] 
Abscissae represent sequence space, ordinates represent fitness, a) Distance to optimum 
is monotonic relative to fitness (fitness landscape can have 'cliffs'); b) fitness is monotonic 
relative to distance to optimum (landscape can have 'plateaus'); c) fitness and distance to 
optimum are isomorphic (neither cliffs nor plateaus are allowed). 



such that h o f{uj) = d(T,u) and g o d(T,u>) = f(u). These mappings are 
shown in the commutative diagrams in Figure |6j It is clear from the dia- 
grams that mappings h and g are adjoint to encoding k and decoding k^ 1 
schemes. Thus, for these diagrams to commute, these mappings as well as the 
representation space with its topology must satisfy certain properties. This 
represents the fact that monotonicity of fitness and distance (i.e. monotonic- 
ity of h and g) is relative to the choice of a representation space, its metric 
d and encoding-decoding scheme. 



<) 
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Figure 6: Commutative diagrams connecting the preference relationship among pheno- 
types (upper left), fitness values (upper right), distances from the top sequence (lower 
right) and the representation space (lower left). The arrows give the functions relating 
these sets including the fitness/inverse-fitness function (top) and encoding-decoding func- 
tion (left). The landscape is the mapping from the representation space (lower left) to 
the fitness values (upper right). The diagrams show that the relationships h and g be- 
tween fitness and distance to the optimum impose certain properties on the metric d, 
the representation space and the encoding-decoding scheme. Note that d(T,a) is used as 



short-hand for d(re(T), k(o)); see Section 2.1 



If the metric d is monotonic relative to fitness /, then the distance to 
optimum is overdetermined, because there are generally more fitness values 
f(cu) than spheres S(T,n) around the optimum (see Figure|5^). This follows 
directly from the fact that in this case sequences with the same fitness must 
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have the same distances to the optimum, but not necessarily vice versa (see 
Proposition |2~| in |Appendix B ). Transition probabilities P(yt+i \ yt) between 



fitness values are easily determined by transition probabilities P{m \ n) be- 
tween spheres around T and monotonic function h o f(u) = d(T,oo) (see 
Proposition [T] in |Appendix A ) : 



P(Vt+i I Vt) = Tj^i — T7 rr P(m = h(y t +i) | n = h(y t )) 

\h o h(y t+ i)\ 

where h~ l (y) := {x : h(x) = y} is the pre-image of y, and cardinality 
{h^ 1 o h(y)\ > 1 represents degeneracy of the mapping h (i.e. the number of 
fitness values at the same distance from T as y). Thus, generally P{y t+ \ \ 
y t ) < P(m = h{y t+ i) \ n = h(y t )), when distance to optimum is monotonic. 
In addition, it is easy to show that in the case of a globally monotonic metric 
there can be only one optimal element. Indeed, applying the definition to Ti 
and T2 we have: 

/(Ti) = /(T 2 ) =^ c /(T 2 ,T 1 ) = rf(T 2 ,T 2 ) = ^ T 1 = T 2 

The case of distance being overdetermined has little practical interest for 
our theory. In addition, this property does not allow for fitness plateaus as 
can be seen from Figure |5^i. Such plateaus may be important in biology 
|37j . It is therefore particularly interesting to look at the case where / is 
monotonic to d, which allows for plateaus. In this case distance to optimum is 
underdetermined, because there can be fewer fitness values f(u>) than spheres 
S(T, n) around the optimum (see Figure [5Jd) . It follows directly from the fact 
that in this case sequences with the same distance from the optimum must 
have the same fitness values, but not necessarily vice versa (see Proposition^ 
in Appendix B). Transition probabilities P(yt+i \ yt) between fitness values 
can be computed from transition probabilities P(m \ n) between spheres 
around T and monotonic function g o d(T,cu) = f(u) (see Proposition [l] in 



Appendix A): 



n 



One can see that the relation between two transition probabilities is more 
complicated than in the previous case, and captures a model of 'noisy' com- 
munication between fitness and distance simply in the mapping g. The 
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amount of noise in this case depends on the average degeneracy of the map- 
ping g, represented by the average number of distance values \g~ l {y)\ cor- 
responding to each fitness value y = f{oo). The extreme case is a constant 
fitness function, which has only one value so that all sequences are optimal. 
A non-trivial example of a highly degenerate landscape is a Boolean land- 
scape, where fitness can have only two values, a situation close to many in 
biology where a single, non-lethal aspect of the environment is critical for 
determining fitness (e.g. a nutrient that either can or cannot be utilised, an 
absent vitamin that is or is not required or, resistance or not to a pathogen 
or stressor). We now combine the results obtained in Section [2] with those 
in this section to derive transition probabilities between fitness values on 
this Boolean landscape where fitness is not isomorphic to distance as in the 
landscapes used in Section [2] and how this leads on to optimal mutation rate 
control even in this degenerate case. 

Example 1 (Boolean landscapes). Boolean fitness landscape is defined by 
f(cu) — 1 if cj = T; f(u) = otherwise. There can be multiple optima 
T G Q with /(T) = 1, and the domain is partitioned into two disjoint 
subsets f-\l) = {u : f(u) = 1} and /^(O) = {u : f(u) = 0}. Because 
there are only two fitness values, there are only four transition probabilities 
P(yt+i I Vt) between them, the most important of which for optimisation 
purposes is probability P{y t +i — 1 | y% — 0). This probability is related to 
probability P(oj t +i \ Wt) of transitions between any two points u t , 0J t +i G Vt 
in the following way: 

P(y t+ i = l\y t = 0) = — i— E p ("t + x\u t ) 

u 1 n Wt+1 e/- 1 (i)^e/- 1 (0) 

One can see that the size of subsets / _1 (1) and / _1 (0) relative to each other 
plays an important role, and this characteristic can be used to study different 
types of Boolean landscapes. When u are represented by sequences in a 
Hamming space HL, the probability P(ui t+1 \ oj t ) with dg(u t +i,iO t ) = n is 
given by equation (JsJ) : P M (aj t +i | w*) = (a — l)~ n fi n (l — fi) l ~ n . This expression 
can be used to maximise the transition probability above by optimising the 
mutation rate /i(0). 

4-3. Weak monotonicity 

Generally, fitness landscapes may have different local monotonic proper- 
ties, described above, and the relationship between fitness and distance to 
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an optimum may not be described by any function, but rather it is non- 
deterministic, described by conditional probabilities P(n \ yt) and P{y t+ i \ 
m). In this case, we can still define monotonicity in a weak sense (i.e. on 
average) using conditional expected fitness values within spheres of a given 
radius from point lo: 

Definition 2 (Weak local monotonicity). Let (Q, d) be a metric space, and 
let / : Q — > 1R be a real function. Then we call / weakly locally monotonic 
at u relative to metric d if there exists a ball B(u, 5) such that for all a, b 
within this ball, the following condition holds: 

-d(u, a) = -n< -d(u, b) = -m E{/ | n} < E{f \ m} 

It is not difficult to show that all fitness landscapes are weakly and locally 
monotonic at T. To see this, assume the opposite, that E{/ | n} > E{/ | m} 
holds for all neighbourhoods of T. Then clearly sup f{uj) cannot be attained 
at T (i.e. T is not the optimum). Thus, there must be some neighbourhood 
B(T, 5), containing elements other than T, where weak monotonicity holds. 
Our analysis in Section [5] suggests that biological landscapes may exhibit 
weak monotonicity in large neighbourhoods of the optimum. 

As discussed previously, if / is locally monotonic relative to d, then 
spheres S(T, S) cannot contain elements with different values y = f(u)). This 
is not true in the case of weak monotonicity. The variation of fitness within 
the spheres S(T,n) can be measured by the conditional variance of fitness: 

1 V ' 71 tu:d{T ,uj)=n 

Clearly, stronger monotonicity implies smaller variance <J 2 (f | n). It is not 
difficult to see that an increase of expected fitness E{/ | n} — > /(T) coincides 
with a decrease of the variance cr 2 (f \ n) — y 0. Because of these weak 
locally monotonic properties of general fitness landscapes, the probabilities of 
transitions P(y t +i \ yt) between fitness values that are close to the optimum 
Vti yt+i > /(~0 — e will be similar to transition probabilities P(m | n) 
between spheres with n, m = d(T,u) < 5. Therefore, we formulate the 
following hypotheses: 
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Hypothesis 1 Optimal mutation rate increases with a decrease in fitness 
in some neighbourhood of an optimum for realistic fitness landscapes 
(e.g. biological landscapes) where fitness is not isomorphic to distance, 
similar to the monotonic increase in optimal mutation rate derived for 
the isomorphic case. 

Hypothesis 2 Real and biological landscapes exhibit weak monotonicity in 
large neighbourhoods of an optimum. 

Hypothesis 3 The larger the neighbourhood of weak monotonicity, the 
more mutation rate control may contribute to evolution towards high 
fitness. 

5. Evolving Fitness-Based Mutation Rate Control Functions 

To test the relevance of our predictions about the optimal mutation rate 
control functions more widely in biologically realistic sequence-fitness land- 
scapes, we used the described earlier Meta-GA technique (see Section [3]) 
to evolve approximately optimal functions for 115 published complete land- 
scapes of transcription factor binding [30]. Transcription factors have evolved 
over very long periods to bind to specific DNA sequences. The landscapes 
show experimentally measured strengths of interaction (DNA-TF binding 
score) between the double-stranded DNA sequences of length I = 8 of base 
pairs each and a particular transcription factor. Because these landscapes 
represent results of direct interaction between the DNA sequences and the 
transcription factors, the DNA sequences can be thought of as both 'pheno- 
types' and their codes, which allows us to identify the space Q of phenotypes 
with the representation space, which in this case is the Hamming space 
(a — 4, I = 8). The DNA-TF binding score, however, which plays the role 
of fitness, is clearly not identical to the negative Hamming distance of a 
sequence from the top sequence (a sequence with the maximum DNA-TF 
binding score). In this section, we show that the mutation rate control func- 
tions obtained for these landscapes using evolutionary technique conform well 
to our theoretical predictions about the optimal mutation rate control. 

5.1. Evolved control functions 

We used the Meta-GA evolutionary optimisation technique, described in 
Section |3j to obtain for each landscape the best possible mutation rate control 
function that maximises the average DNA-TF binding score in the population 
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(expected fitness) after 500 replications. The Meta-GA algorithm converged 
within a small margin of statistical error to a specific mutation rate control 
function in each landscape. To get sufficiently significant results as well as 
an estimate of the convergence, 16 replicate runs were performed in each of 
the 115 transcription factor landscapes. 

Figure [7] shows the average values and standard deviations of the evolved 
mutation rates for three transcription factors: Srf, Glis2 and Zfp740. Evolved 



functions for all landscapes are shown on Figure D.10 in supplementary ma- 
terial. One can see that the evolved functions for each transcription factor 
landscape is monotonic in the direction predicted: close to zero mutation at 
the maximum fitness, rising to high levels further from the maximum fitness 
value. Once the mutation rate has peaked near the maximum value fi = 1, 
the mutation rates tend to decrease and become chaotic. As will be shown 
in the next section, this occurs at lower fitness values at which the landscape 
is no longer monotonic (i.e. further from the peak of fitness). Small stan- 
dard deviations indicate good convergence to a particular control function. 
Observe that there is poor convergence at low fitness areas of the landscape 
that are poorly explored by the genetic algorithm. 

5.2. Landscapes for transcription factors 

The variation in the evolved mutation rate control function is clearly 
related to a variation in the properties of the landscapes. Our theoretical 
analysis suggests that the main property affecting the mutation rate control 
is monotonicity of the landscape relative to a metric measuring the mutation 
radius. In particular, the radius of point-mutation is measured by the Ham- 
ming metric, and we shall look into the local and weak monotonic properties 
of the transcription factors landscapes relative to the Hamming metric. 

Figure [8] shows average DNA-TF binding scores within spheres S(T, n) 
around the optimal sequence as a function of Hamming distance n = dnO~, u) 
from the optimum. Data is shown for three transcription factors: Srf, Glis2 
and Zfp740. Lines connect average values at discrete distances for visualisa- 
tion purpose. Errorbars show standard deviations of the DNA-TF binding 
scores within the spheres. Distributions of fitness with respect to Hamming 



distance (i#(T,a;) for all 115 transcription factors are shown on Figure D.ll 
(supplementary material). 

One can see from Figure [8] that the landscape for the Srf factor has 
monotonic properties: The average values increase steadily for sequences 
that are closer to the optimum, and the deviations from the mean within the 
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Figure 7: Examples of GA-evolved optimal mutation rate control functions. Data are 
shown for the transcription factors Srf, Glis2 and Zfp740. Each curve represents the aver- 
age of 16 independently evolved optimal mutation rate functions on a particular transcrip- 
tion factor DNA-binding landscape |30j . Errorbars represent standard deviations from the 
mean. Similar curves for all 115 landscapes are shown in supplementary Fig. |D.1Q) The 
arrows indicate the edge of monotonicity e, that defines an interval of fitness values below 
the maximum, where mutation rate monotonically increases. 



spheres are relatively small. This is in contrast to the other two landscapes. 
We note also that the average values for Glis2 decrease quite sharply around 
the optimum, while the landscape for Zfp740 has a relatively flat plateau 
area around the optimum, which means that there are many sequences with 
high DNA-TF binding score. This difference may explain different gradients 
of optimal mutation rates near the maximum fitness shown on Figure [7} 

5.3. Monotonicity and controllability 

Our results have confirmed that the evolved optimal mutation rates rise 
from zero to very high levels as fitness decreases from the maximum value 



/(T) to some value /(T) — e (see supplementary Fig. D.10). We refer to 



the corresponding value e > as the monotonicity radius, as it defines the 
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Figure 8: Examples of fitness landscapes based on the binding score between DNA se- 
quences and transcription factors (TF) from |30) . Data are shown for the transcription 
factors: Srf, Glis2 and Zfp740. Lines connect mean values of the binding score shown as 
functions of the Hamming distance from the top sequence (a sequence with the highest 
DNA-TF binding score). Errorbars represent standard deviations. Similar curves for all 



115 landscapes are shown in supplementary Fig. D.ll 



neighbourhood of T in terms of fitness values in which the evolved muta- 
tion rate control function has monotonic properties. We find substantial 
variation in monotonicity radius among transcription factors (see Fig. [7] and 
supplementary Fig. D.10) 

We hypothesised that the variation in the optimal mutation rate control 
functions relates to variation in the monotonicity of the transcription factor 
landscapes. Various measures have been proposed for the roughness of bio- 
logical landscapes Here we focus on the Kendall's r correlation, which is 
directly concerned with monotonicity; specifically, r measures the proportion 
of mutations that, in moving closer to the optimum in sequence space, also 
increase in fitness. As shown in Figure [9j we find that r of the landscape 
does indeed have a relationship with the monotonicity radius e of the evolved 
mutation rate control functions (Spearman's p = 0.77, P ~ 10~ 16 , N = 115). 
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Figure 9: Linear relation between monotonicity of the landscapes measured by the 
Kendall's r correlation (ordinates) and the edge of monotonicity e (abscissae) of the cor- 
responding evolved mutation rate control functions. Three labels show data for three 
transcription factors shown in Figs. [7] and [8j 



Finally, we hypothesise that these related features of the transcription 
factor landscape and mutation rate function themselves relate to the biologi- 
cal evolution of this transcription factor system. To test this we looked at the 
evolutionary age of transcription factor families [38] • We find the suggestion 
of a relationship between the monotonicity of a landscape (r) and the age of 
the transcription factor family, implying that the more recently a transcrip- 
tion factor family evolved, the more monotonic is its landscape (Spearman's 
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p = 0.23, P = 0.061, iV = 115). We find a more substantial relationship 
between this evolutionary age and the monotonicity radius e (Spearman's 
p = 0.36, P = 0.0032, N = 115). 

6. Discussion 

In this paper we have developed and tested theory relating to the control 
of mutation rate in biological sequence landscapes. To do so, we had to move 
the theory closer to the biology in three ways. Firstly (in Section[2]), we gener- 
alised Fisher's geometric model of adaptation, from its Euclidean space (con- 
tinuous and infinite) to discrete, finite Hamming spaces of sequences. Doing 
so demonstrated that, in contrast to the behaviour in Euclidean space, where 
the probability of beneficial mutation behaves similarly at different distances 
from the optimum [39J, the probability of beneficial mutation, for a given 
mutation size, varies markedly depending on the distance from the optimum 
(Figure [2]). Secondly, we analytically derived functions for optimal control of 
the mutation rate minimising the expected Hamming distance to a particular 
point (optimal sequence). We demonstrated also a variation of these control 
functions dependent on specific formulations of the optimisation problem. 
Nonetheless we observed consistency: all optimal functions increase mono- 
tonically (Figure [3]). Thirdly, we developed theory concerning locally and 
weakly monotonic landscapes, demonstrating that all possible landscapes, 
including biologically rugged landscapes, can be included in these categories 
and thus that, at some level, our theoretical findings regarding mutation rate 
control may be applied to realistic biological landscapes. The most striking 
differences from existing theory in Euclidean spaces occur when sequences are 
short and far from a peak. We therefore used transcription factors binding to 
DNA sequences [30J as a test case, which involves both short sequences (eight 
base-pairs) and highly evolved binding specificity (i.e. we expect that many 
sequences will bind much more weakly than the best). We tested hypotheses 
arising from the theory, relating to the nature of optimal mutation rate func- 
tions (Hypothesis 1 and Figure [7]), the monotonic properties of landscapes 
(Hypothesis 2 and Figure [8]) and the relationship between the two (Hypothe- 
sis 3 and Figure [9]). In each case we find evidence to support the hypothesis, 
implying that our theory is relevant to these biological landscapes. 

We have considered the possibility of varying the general mutation rate for 
a single genotype, that is mutation rate plasticity, and identified forms that 
such plasticity may be expected to take as a function of fitness in biological 
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fitness landscapes. This raises a number of important questions about how 
this theory might relate to living organisms. The primary question is whether 
such control of mutation rate plasticity actually occurs in nature. Variation 
in mutation rate is well known, and organisms with a genetically encoded 
raised mutation rate, termed mutators, are found at appreciable frequencies 
in various real populations, apparently via their association (that is 'hitch- 
hiking') with beneficial mutations [UJJHT]. Mutation rate plasticity is a more 
subtle effect than simply being a mutator. However, as with the evolution of 
mutators, for mutation rate control to have evolved at all might be expected 
to require so-called 'second-order selection', that is selection not directly on 
a trait's effect on an individual's fitness, but indirectly via the genetic effects 
it produces [42J. While rare, there are clear examples of second-order selec- 
tion occurring in biology [13], and in our more abstracted system of genetic 
algorithms we do see mutation rate plasticity rapidly evolving to particular 
forms (Figure [7]). This implies that mutation rate control of the sort we have 
considered may reasonably be hypothesised in biology. 

Most existing discussion of mutation rate plasticity in nature concerns 
the observed phenomenon of stress-induced mutagenesis [4"4"] . It has long 
been postulated, and most recently argued from a population genetic model 
[15], that such plasticity might indeed be adaptive. Such adaptationist hy- 
potheses for stress-induced mutation have been subject to protracted debate 
[4"5] . but here are present two difficulties. First, it is necessary to exclude 
alternative, non-adaptive hypotheses. For instance, it seems likely that a 
raised mutation rate could be a physiologically unavoidable direct effect of 
stress. This has long been speculated, for instance Muller remarked that 
the kinetics of temperature's effect on mutation rate resembles that of an 
ordinary chemical reaction [IB]. Second, there needs to be a connection be- 
tween the imposed or measured variable, stress, and the variable considered 
by the theory, (inverse) fitness. The first difficulty is ameliorated by the 
development of explicit theory around non-adaptive hypotheses of mutation 
rate variation [13]. However, this population genetic theory is currently de- 
fined as an alternative to physiological hypotheses of mutation rate variation, 
whereas real organisms experience both physiological and population-genetic 
constraints. Integrating the two would help understand what might be ex- 
pected in terms of stress-induced mutagenesis without recourse to adaptive 
hypotheses. Regarding the second issue, the connection between stress and 
fitness, 'stress', as typically defined, can be difficult to separate from 'normal' 
physiological processes [IT] . This means that stress is not a simple inverse 
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of fitness. Indeed, stress may actually be associated with increased fitness 
(e.g. in the phenomenon of hormesis [IE])- Therefore, while it is possible 
that stress-induced mutagenesis is an example of mutation rate control as 
discussed in this paper, further work is required to clarify how exactly the 
theory relates to that example and perhaps to look for new examples of 
mutation rate control. 

Given the current uncertainties about the existence of mutation rate con- 
trol in nature, it is important to ask whether, nonetheless, mechanisms exist 
whereby the processes discussed in this paper could be exercised. The very 
existence of mutator pheno types demonstrates that, physiologically, increas- 
ing mutation rates from the low values typical of biology is possible. If it 
is possible via genetic change in mutators, it seems highly likely also to be 
possible in a controlled way via plastic changes. Indeed, several different 
mechanisms for modulating mutation rates have been proposed, notably by 
regulating particular DNA repair mechanisms (IHl EQl EE] or U P regulating 
mutagenic repair [521 ESI El] • 

While regulation of mutation rate is mechanistically feasible, a more chal- 
lenging issue for the relevance of the theory presented here is whether feed- 
back mechanisms exist for an individual organism to assess its own fitness 
against which to set its mutation rate. Stress is one indicator that may be 
assessed by an individual and is known to induce regulatory responses (e.g. 
the SOS response in bacteria [55]). but as discussed above, stress may not be 
a clear indicator of fitness. We propose three possible alternative feedback 
mechanisms, assessing either absolute or relative fitness. Absolute fitness 
is the scale used in the theory developed in this paper and concerns the 
number of offspring left in subsequent generations. For some organisms it 
may be possible to assess absolute fitness by assessing their own reproduc- 
tive period relative to an internal or external clock. It is notable that one 
of the best characterised examples of stress-induced mutation [TJ] actually 
relates to mutagenesis in ageing bacterial colonies (MAC) and ageing may 
be an appropriate biological clock for this mechanism, one that is known to 
be associated with mutation rates in human males [55] . Secondly, for organ- 
isms with limited dispersal rates, the number of live organisms of the same 
genotype in the near vicinity may be a proxy for absolute fitness. Thirdly, 
while the fitness scale we have worked with is absolute, we have demonstrated 
elsewhere [261 121] that appropriate mutation rate functions may be approx- 
imated by the cumulative distribution function of the population fitnesses 
through evolution (also shown in Figure pi). That is, information about an 
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organism's fitness relative to others in its population could in principle act 
as feedback, allowing an individual to set its mutation rate in a good ap- 
proximation to what would be optimal if absolute fitness were known. These 
latter two mechanisms raise the intriguing possibility that population-level 
or social effects could be important in determining individual mutation rates. 
Testing which, if any, of these processes actually occurs in biology will give 
important insights into evolutionary mechanisms. 

We have focused on fitness-associated control of mutation rate. However, 
mutation is only one evolutionary process where fitness-associated control 
may be beneficial. Recombination and dispersal are also evolutionary pro- 
cesses that may be under the control of the individual and therefore open 
to similar effects. Fitness-associated recombination has been demonstrated 
to be advantageous theoretically (57J [58] and identified in biology [59j 160] . 
Similarly, the idea that dispersal associated with low fitness might be ad- 
vantageous has a basis in simulation of spatially differentiated populations 
pxTl E2] ■ This association might perhaps be framed more generally in terms 
of 'fitness associated dispersal'. Thus the framework for control of mutation 
rate in response to fitness that we have developed here may in future be 
applicable to both recombination and dispersal. 

To conclude, our development of theory and testing its predictions in 
silico not only clarifies ideas around the monotonicity of fitness landscapes 
and mutation rate control, it leads directly to questions testable in living 
organisms. At the same time there is the potential for greater insight through 
further development of the theory. Three directions seem particularly likely 
to be fruitful. 

First, while it is striking how effective mutation rate control is for adap- 
tive evolution without invoking selection in our in silico experiments, it will 
be important to consider the role of a selection strategies. Such strategies 
may implicitly modify fitness functions. For instance, one of the analyti- 
cally derived functions shown in Figure [3] is the mutation rate function for a 
DNA space ("H^ ) which maximises the probability of adaptation (as derived 
by Back for binary sequences |24J). As outlined in the corresponding sec- 
tion, maximising the probability of adaptation is equivalent to maximising 
expected fitness of the offspring relative to its parent. This effect may be 
implicit in a selection strategy that removes the offspring of reduced fitness 
that will inevitably be produced by maximising offspring expected fitness. 
Given the importance of selection in biology, we therefore anticipate that 
such functions may be closer to putative mutation rate control functions in 
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living organisms. This requires further work. 

A second area for development is in variable adaptive landscapes. The 
importance of time- varying adaptive landscapes in biological evolution is be- 
coming increasingly appreciated [63| IM] and variable mutation rates have a 
particular role here [U5]- It is worth noticing however that our derivation of 
optimal mutation rate functions is not dependent on a fixed landscape, as it 
depends only on the fitness values. Nonetheless, as we demonstrate for the 
transcription factor landscapes, variation in landscapes' monotonic proper- 
ties relates to the shape of mutation rate functions in predictable ways (Fig- 
ure [9]). This deserves further exploration both theoretically and empirically: 
measuring variation in the monotonic properties of real biological landscapes 
will be informative about optimal mutation rate functions and vice versa. 

Finally, there is potential to develop theory around the role of encoding- 
decoding schemes. Landscape monotonicity, as explored here, is not absolute; 
it depends on the encoding- decoding scheme (see Figure [6]). That is, if the 
encoding changes, it may be possible to convert a non-monotonic landscape 
into a monotonic one. Biology uses a variety of such encoding schemes which 
may themselves evolve. For the transcription factor landscapes used here, the 
encoding-decoding scheme is defined by the biochemical interactions between 
the transcription factor (a protein molecule) and DNA. Thus, evolution of 
transcription factors constitutes evolution of the encoding-decoding scheme, 
and indeed we do find a relationship between that evolution (age of families) 
and the monotonic properties of the associated landscapes. A more familiar 
example of a biological encoding-decoding scheme is the genetic code where 
there is much existing work on its evolution (e.g. [66]). Determining how evo- 
lution of such codes affects the monotonic properties of biological landscapes 
as explored here may therefore provide novel insights into large-scale evolu- 
tionary patterns. Ultimately, theory such as this that identifies analytically 
or empirically optimal mutation rate control functions may help make pre- 
dictions about evolutionary responses to future environmental change [67] 
or inferences about the environment (s) within which particular organisms 
evolved. In the meantime mutation rate control as developed here will as- 
sist directed evolution within biological and other complex landscapes, for 
instance in the evolution of DNA-protein binding [68J . 
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Appendix A. Memoryless Communication 



Let (X, X) and (Y, y) be measurable sets. We shall now consider an 
X x F-valued stochastic process {(x t , Vt)}t>o an d the 'similarity' between 
the marginal processes {x t }t>o an d {yt}t>o under special assumptions on the 
communication between X and Y . Recall that a Markov transition kernel 
from (X, X) to (Y, y) is a conditional probability measure P(Yi \ x) on (Y, y), 
which is Af-measurable for each Yi e y. We shall often use measure-theoretic 
notation dP(y \ x) for transition kernel P{Y% I x )-> especially when it appears 
under the integral. 

Proposition 1. Let (X, X) and (Y, y) be measurable sets, and let {(x t , yt)}t>o 
be a X x Y -valued stochastic process such that elements of the marginal pro- 
cess {yt}t>o are conditionally independent given corresponding elements of 
{xt}t>o: 

dP(y t , . . . , y | x t , . . . , x ) = dP(y t \ x t ) x • • • x dP(y | x ) 

Then transition kernel dP(yt+i \ yt) can be expressed as a composition of 
transition kernels dP(x t \ yt), dP(x t +i \ x t ) and dP{y t+ \ \ x t +i) as follows: 

dP{y t +i \yt)= J J dP(y t+1 | x t+ i) dP(x t+1 \ x t ) dP(x t \ y t ) 

x t+1 ex x t ex 

This transition kernel has the following properties: 

1. If X and Y are statistically independent, then y t +i G F is independent 
ofy t E Y: dP(y t+1 \ y t ) = dP(y t+1 ) 

2. // dP(x | y) corresponds to a function x = h(y) and y are uniformly 
distributed in the preimage h^ 1 (x), then 

dP(y t +i | yt) = 77— i — T7 7] dP(x t+ i = h(y t+ i) | x t = h(y t )) 

3. // dP(y | x) corresponds to a function y = g(x) and x are uniformly 
distributed in the preimage g^iy), then 



dP(y t +i | yt) = |g-i( yt )| J J dP(x t+ i | x t ) 
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4. If dP(y | x) corresponds to a bijection y = h{x), then 

dP(y t +i I Vt) = dP(x t+1 = h(y t+1 ) | x t = h{y t )) 

Proof. Transition kernel dP(x t +i \ x t ) can generally be expressed as follows: 

dP(y t+1 | yt) = / J dP(y M , Wt \y t ) 
x t+1 ex x t ex 

= J J dP(y t+1 | x t+1 ,x t ,y t )dP(x t+1 \ x t ,y t )dP(x t \ y t ) 



x t+1 ex x t &x 



Using the Bayes formula and conditional independence dP(yt+i, yt \ Xt+i,xt) = 
dP(y t +i | x t+ i)dP(y t \ x t ) one can show that dP(y t+1 \ x t+1 ,x t ,y t ) = 
dP(y t+1 | x t+1 ) and dP(x t+1 \ x t ,y t ) = dP(x t+1 \ x t ). Indeed 



dP(y t+1 | x t+1 ,x t ,y t ) 



dP(y t +i,yt 


x t +i,x t ) 


J dP(y t+1 ,y t \ 


x t+li Xt) 



dP(y t +i 


x t 


+i)dP(y t 


x t ) 


I dP(y t+1 


x t+ i)dP(y t 


x t ) 



dP(y t+1 | x t+1 ) 



dP{x t+l | x t ,y t ) = J dP(y t+1 ,x t+1 \ x t ,y t ) 



yt+ieY 



I 

yt+ieY 

I 



dP(y t +i,yt 


x t+ i,x t ) dP(x t+ i 


x t ) 


dP(y t 


Xt) 



dP(y t +i 


x t+1 ) dP(y t 


| x t ) dP(x t+ i 


x t ) 


dP(y t | 


x t ) 



yt+ieY 
= dP(x t+ i | x t ) 

Thus, dP(y t+ i \ yt) can be expressed using the composition of transition 
kernels dP{y t+ i \ x t+ i) dP(x t+ \ \ x t )dP(x t \ yt)- We now consider four 
important cases. 
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1. If X and Y are independent, then dP(y t +i \ x t +i) = dP(y t +i) and 
dP(xt | yt) = dP(xt), and therefore 



dP(y t+1 | y t ) = dP(y t+1 ) J J dP(x t+1 \ x t ) dP(x t ) = dP(y t+1 ) 

2. If x = h(y) and y are uniformly distributed in the preimage h~ l (x), 
then 

dP(x t | y t ) = 5 Hyt) {x t ) , dP(y t+1 \ x t+l ) = — — — -r 

\h 1 o h(y t+1 )\ 

which gives the resulting expression. 

3. If y — g(x) and x are uniformly distributed in the preimage g~ 1 {y)- l 
then 

dP(x t | y t ) = | g- ' dP(y t+ i \ x t+ x) = 6 g{xt+l) (y t+1 ) 

The resulting expression is obtained by integrating dP(xt+i \ Xt) for 
each x t+ i G g' l (y t+ i) and x t G g^iyt)- 

4. Follows trivially from the fact that o h(y)\ = 1 for a bijection. 

□ 

Remark. It is not required in Proposition [T] for any of the stochastic processes 
{(x t , yt)}t>o, {xt}t>o or {yt}t>o to be Markov. It is well-known, however, that 
if {x t }t>o is Markov (i.e. dP(x t+ i \ x t ,...,x ) = dP(x t+1 \ x t )) and y t are 
conditionally independent given the corresponding x t , then the combined 
process {(xt, yt)}t>o is Markov as well, because in this case dP(x t +i, yt+i \ 
x t ,y t , ■ ■ ■ ,x ,yo) = dP(y t+ i \ x t +i) dP(x t +i \ x t ). The unobserved process 
{xt}t>o is often referred to as a hidden Markov model, and x t is estimated 
from observed values y , . . . ,y t of the related process {yt}t>o (this is called 
the filtering problem [HE])- Note that the observed process {yt}t>o is usually 
non-Markov (i.e. dP(y t+1 \ y t ,...,y ) ^ dP(y t+ i \ yt))- In the context of 
Section |4j the unobserved variable x G X is distance to optimum d(T, u), 
and observed variable y G Y is fitness. 
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Appendix B. Monotonicity 



Proposition 2. Let (fl, d) be a metric space, and let f : Q — >■ K 6e a function 
with f(T) = sup /(w) /or some T e fi. // £/ie metric d is monotonic at T 
relative to f , then all oo with the same values f(oo) have the same distance 
d(T, u>) from the optimum. Conversely, if f is monotonic at T relative to d, 
then all u with the same distance d(T,u) from the optimum have the same 
values f(u)). 

Proof. Indeed, using the definition of monotonic d: 

f(a) = f[b) <=► f(a) < f\b) A f(a) > f(b) 

-d(T,a) < -d(T,b) A -d(T, a) > -d(T, b) 
<^ d(T,a)=d(T,b) 

Using the definition of monotonic /: 

d(T,a) = d(T,b) —d(T,a) < —d(T,b) A —d(T,a) > —d(T,b) 

=► /(a) < f(b) A /(a) > f(b) 
<=► f(a)=f{b) 

□ 



Appendix C. Point Mutation as Optimal Solution of Variational 
Problem 

Let (fi, rf) be a metric space, dQia G f]) be a probability measure of the 
'parent' points, and let rfP(6 G f2) be a probability measure of their 'offspring' 
points obtained by a stochastic transformation defined by the transition ker- 
nel dPib | a). The product oLP(& | a) dQia) defines a joint probability mea- 
sure of parents and their offspring. The expected distance between parents 
and offspring is 

E{dia,b)}= J dia,b)dPib\a)dQia) 

Qxf2 

The mutual information between parents and offspring is defined as 

dPib | a) dQia) 

SlxSi 



I{a, b} 



In 



dPib | a) 
dPib) 
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We remind that I{a, b} > with zero if and only if a and b are statistically 
independent. The supremum of I{a, b} corresponds to the case when b is 
obtained from a deterministically using some injective function on Q (i.e. 
a one-to-one mapping). For example, if b is identical to a (i.e. dP{b \ a) 
corresponds to the identity mapping on fl), then I{a, b} = sup I{a, b} = \Q\ 
and d(a, b) = 0. Consider the following variational problem 

minimise E,{d(a,b)} subject to I{a,b} < A (C.l) 

where optimisation is over all joint probability measures dP{b \ a)dQ(a) 
or over all transition probabilities dP(b | a), if dQ(a) is fixed. Because of 
the constraint on mutual information, the transition probabilities dP(b \ a) 
cannot correspond to any injective function on Q, and therefore generally b 



cannot be identical to a so that E{<i(a, b)} > 0. Note that problem (C.l ) has 
the following 'inverse' problem: 

minimise I{a, b} subject to K{d(a, b)} < v (C.2) 

The constraint on the expected distance implies that a and b are not inde- 
pendent so that I{a, b} > 0. It is well-known in information theory (e.g. 
see j69j [70] or [71J for generalisations) that solutions to these variational 
problems are members of an exponential family 

dP g (b | o) = e-M'M-W*) dP(b) , e*<-> = / e—> d P( b) 

B 

where parameter (3 (called the inverse temperature) is defined from one of 
the conditions: 

I{a,b} = \, K{d(a,b)}=v 

Moreover, if the metric space Q is also a group (Q, +) with invariant measure 
z/, and the metric is translation invariant d(a, b) = d(a + c, b + c), then these 
exponential transition kernels have the following simplified form 

dP p {b | a) = e - W)-*o(« dv{h) ; e *o(/3) = f e -pd(a,b) dv{h) 



B 



In particular, this is the case when Q is a normed vector space, and the metric 
is defined using the difference of two vectors: d(a, b) = \\a — b\\. For example, 
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the Hamming space T-L a := {1, . . . , a} is a finite vector space over a finite 
field GF(a) with the Hamming metric defined as dn{a, b) = \\a — b\\n, where 
|| ■ 1 1 ^ is the Hamming weight. The invariant measure on a Hamming space 
is the counting measure u(b) = 1. Thus, for a Hamming space the optimal 
transition kernel solving problems (C.l) and ( |C.2[ ) is 



e -P\\a-b\\ H 



We now show that the above exponential transition kernel implements point 
mutation. 

Indeed, because e -13 ^^ 11 = e~ l3r for all sequences in the sphere S(a, r) := 
{b : \\a — b\\n — r} around point a and radius r, the summation of e' 13 ^^ 11 
over all sequences b £ % l a can be replaced by the summation of \S(a, r)|e _/3r 
over the spheres of all radii r £ {0, . . . ,1}. The number of sequences in a 
sphere of the Hamming space % l a is \S(a, r)| = (a — l) r ('), and therefore 



e *o(0 = J2 e-P^* = J2(a - l) r ( l \-^ = [1 + (a - l)e~^ 



beH l a r=0 

Thus, Pp(b | a) has the following simple expression: 

„-/3 ||o-6||_h- 

Pp(b | a) 



[1 + (a - l)e- 



1/ 



Given a sequence that is n = ||T — a\\jj letters away from T, the probability 
of mutation by radius r = \\a — b\\n is: 



Pp{r\n) = \S(a,r)\P p {b\a) = {a 



r) [1 + (a- l)e-P} 1 



The inverse temperature parameter /3 is determined either from condition 
I{a, b} = A or E{||a — b\\n} = v. In particular, it is convenient to use the 
latter condition in conjunction with the following expression for the expected 
mutation radius 

E M = 4^o(/3) ' 



dp ^ J \ + eP/(a-\) 
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Inverting the equation E{r}(/3) = v gives the result 



(3 = In 0~^-\ +hi(a-l) 

Changing parametrisation from to v, the probability Pp(r \ n) can be 
written as binomial distribution with probability of success \x = v/l: 



Therefore, exponential transition kernel that solves optimisation problems (C.l ) 
and (C.2) in the Hamming space corresponds to independent substitution of 
each letter in a sequence to any other of the a — 1 letters with probability 
— 1), and this process is known as point mutation. 
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Figure D.10: Optimal mutation rate control functions evolved by a Meta-GA on the 
transcription factor DNA-binding landscapes from [30]. Ordinates show mutation rates, 
and abscissae show the binding scores. Each panel corresponds to a different transcription 
factor. Lines connect the average mutation rates obtained in 16 independent trials on a 
particular landscape. Errorbars represent standard deviations from the mean. The GAs 
do not spend much time at low binding scores meaning that the results become more 
random. 
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1 1 1 r 

2 4 6 8 
6 Bbx_3753.1 TGAATGAA 




i 1 1 r 

2 4 6 8 
11 E2F2_1 022.4 GGCGCCAA 




\ 1 I 1 r 

2 4 6 8 
16 Ehf_3056.2 ACTTCCGG 




1 1 I 1 r 

2 4 6 8 
21 Foxj1_3125.2 CAAACAAA 




1 i i i r 

2 4 6 8 
26 Gata3_1 024.3 AGATAAGA 




\ — i — i — i — r 

2 4 6 8 
31 Glis2_1757.2 GACCCCCC 




1 1 1 1 r 

2 4 6 8 
36 Hnf4a_2640.2 GGGGTCAA 





1 1 1 r 

2 4 6 8 
7 Bcl6b_0961.2 TCTAGGAA 




i — i — i — i — r 

2 4 6 8 
12 E2F3_3752.1 GCGCGCGC 




i — i — i — i — r 

2 4 6 8 
17 Elf3_3876.1 ACTTCCGG 




1 1 I 1 r 

2 4 6 8 
22 Foxj3_0982.2 ATAAACAA 




1 i i i r 

2 4 6 8 
27 Gata3_4964.2 AGATAAGA 




\ 1 I 1 r 

2 4 6 8 
32 Gm397_1 753.4 TGTGCACA 




1 1 I 1 r 

2 4 6 8 
37 Hoxa3_2783.2 TTAATTAA 





i 1 I 1 r 

2 4 6 8 
8 Bhlhb2_1 274.3 TCACGTGA 




i — i — i — i — r 

2 4 6 8 
13 E2F3_3752.2 GCGCGCGC 




i — i — i — i — r 

2 4 6 8 
18 Eomes_0921.4 GGTGTGAA 




T 1 I 1 r 

2 4 6 8 
23 Foxk1_2323.4 GTAAACAA 




i i i i r 

2 4 6 8 
28 Gata5_3768.1 CTGATAAG 




i 1 1 r 

2 4 6 8 
33 Gmeb1_1745.2 TTACGTAA 




i 1 I 1 r 

2 4 6 8 
38 IRC900814_3520.1 ACGACAAA 



i 1 1 1 r 

2 4 6 8 




i 1 1 T 

2 4 6 8 
9 Bhlhb2_4971.1 TCACGTGA 




\ — i — \ — i — r 

2 4 6 8 
14 Egr1_2580.1 GCCCCCGC 



l 1 1 1 r 

2 4 6 8 
19 Esrra_2190.2 CAAGGTCA 




i 1 \ 1 r 

2 4 6 8 
24 Foxl1_2809.2 ATAAACAA 




t i \ i r 

2 4 6 8 
29 Gata6_3769.1 AGATAAGA 




i — i — \ — i — r 

2 4 6 8 
34 Hbp1_2241 .3 TGAATGAA 




i 1 \ 1 r 

2 4 6 8 
39 Irf3 3985.1 GGAAACTA 





i 1 1 r 

2 4 6 8 
10 E2F2_1 022.2 GCGCGCCA 




\ 1 1 r 

2 4 6 8 
15 Egr1_2580.2 GCCCCCGC 




i 1 r 

2 4 6 8 
20 Foxa2_2830.2 GTAAACAA 




i 1 r 

2 4 6 8 
25 Gabpa_2829.2 ACCGGAAG 




i i r 

2 4 6 8 
30 Gcm1_3732.1 ACCCGCAT 




i 1 1 r 

2 4 6 8 
35 Hic1_2816.2 GTGCCAAC 




i 1 r 

2 4 6 8 
40 Irf4_3476.1 ACCGAAAC 




41 Irf5 3874.1 ACCGAAAC 



42 Irf6_3803.1 ACCGAAAC 



43 Isgf3g_2853.2 ATCGAAAC 



44 Jundm2_091 1 .3 TGACGTCA 



45 Klf7_0974.2 CCCCGCCC 




1 1 1 r 

2 4 6 8 
46 Lef1_3504.1 CTTTGATC 




i — i — i — i — r 

2 4 6 8 
51 Mtf1_2377.2 CGTGTGCA 




i — i — i — i — r 

2 4 6 8 
56 Nr2f2_2192.2 AAAGGTCA 




1 1 I 1 r 

2 4 6 8 
61 Rfx3_3961.1 CGTTGCTA 




1 i i i r 

2 4 6 8 
66 SfpM 1 034.2 ACTTCCTC 




\ — i — i — i — r 

2 4 6 8 
71 Sox11_2266.2 AACAAAGG 




1 1 I 1 r 

2 4 6 8 
76 Sox17_2837.2 AATTGTTC 





1 1 I 1 r 

2 4 6 8 
47 Mafb_2914.2 GTCAGCAA 




i — i — i — i — T 

2 4 6 8 
52 Myb_1 047.3 AACCGTTA 




i — i — i — i — T 

2 4 6 8 
57 Osr1_3033.2 ACAGTAGC 




1 1 I 1 r 

2 4 6 8 
62 Rfx3_4970.2 GGTTGCTA 




1 i i r 

2 4 6 8 
67 Sfpi1_1034.3 ACTTCCTC 




\ 1 1 r 

2 4 6 8 
72 Sox12_3957.1 GAACAATA 




1 1 I 1 r 

2 4 6 8 
77 Sox18_3506.1 AATTGTTC 



V 



i 1 1 1 r 

2 4 6 8 




i 1 1 T 

2 4 6 8 
48 Mafk_3106.2 TCAGCAAA 




i — i — i — i — r 

2 4 6 8 
53 Mybl1_1717.2 AACCGTTA 




i — i — i — i — r 

2 4 6 8 
58 Osr2_1 727.2 ACAGTAGC 




T 1 I 1 r 

2 4 6 8 
63 Rfx4_3761.1 CGTTGCTA 




i i i r 

2 4 6 8 
68 Six6_2267.4 GGGTATCA 




i 1 1 r 

2 4 6 8 
73 Sox13_1718.2 GAACAATA 




T 1 I 1 r 

2 4 6 8 
78 Sox21_3417.1 ATTATAAT 





T 1 \ 1 r 

2 4 6 8 
49 Max_3863.1 CCACGTGG 




\ — i — \ — i — r 

2 4 6 8 
54 Myf6_3824.2 CAGCTGTC 




i — i — \ — i — r 

2 4 6 8 
59 Plagl1_0972.2 GGGGCCCC 




T 1 \ 1 r 

2 4 6 8 
64 Rfxdc2_3516.1 GGTTGCTA 




1 i \ i r 

2 4 6 8 
69 Smad3_3805.1 CCAGACAG 




i — i — \ — i — r 

2 4 6 8 
74 Sox14_2677.2 ATTATAAT 




T 1 \ 1 r 

2 4 6 8 
79 Sox30_2781 .2 ATTGTTAA 





i 1 1 r 

2 4 6 8 
50 Max_3864.1 CCACGTGC 




i — i — \ — i — r 

2 4 6 8 
55 Nkx3.1_2923.2 CCACTTAA 




i 1 r 

2 4 6 8 
60 Rara_1051.2 AAAGGTCA 




i 1 1 r 

2 4 6 8 
65 Rxra_1 035.2 TGACCCCA 




i i i r 

2 4 6 8 
70 Sox1_2631 .2 AATTCAAT 




i 1 1 r 

2 4 6 8 
75 Sox15_3457.1 GAACAATA 




i 1 r 

2 4 6 8 
80 Sox4__2941.2 AACAAAGG 



At 

1 ^ 



i 1 1 1 r 

2 4 6 8 



81 Sox5_3459.1 GAACAATA 



82 Sox7_3460.1 AATTGTTC 



83 Sox7_4972.2 AACAATAG 



84 Sox8_1 733.2 AATTGTTC 



85 Sp100_2947.2 AATTTCCG 




1 1 1 1 r 

2 4 6 8 
86 Sp4_1011.2 CCGCCCCC 




i 1 1 T 

2 4 6 8 
91 Tcf1_2666.2 TTAACTAA 




\ — i — i — i — r 

2 4 6 8 
96 Tcfap2a_2337.3 CCTGAGGC 




1 1 I 1 r 

2 4 6 8 
101 Zbtb12_2932.2 GTTCTAGA 




i i i i r 

2 4 6 8 
106 Zfp161_2858.2 GCGCGCGC 






1 1 I 1 r 

2 4 6 8 
87 Spdef_0905.2 ATCCGGAT 




i 1 1 r 

2 4 6 8 
92 Tcf1_2666.3 GTTAACTA 




\ — i — i — i — r 

2 4 6 8 
97 Tcfap2b_3988.1 CCTGAGGC 




1 1 I 1 r 

2 4 6 8 
102 Zbtb3_1 048.2 ATGCAGTG 




1 i i i r 

2 4 6 8 
107 Zfp187_2626.2 ATTAGTAC 




\ 1 1 r 

2 4 6 8 
1 1 2 Zic1_0991 .2 G ACCCCCC 





T 1 I 1 r 

2 4 6 8 
88 Srf_3509.1 AAAAAAAA 




\ 1 1 r 

2 4 6 8 
93 Tcf3_3787.1 CTTTGATC 




\ 1 I 1 r 

2 4 6 8 
98 Tcfap2c_2912.2 CCTGAGGC 




T 1 I 1 r 

2 4 6 8 
103 Zbtb7b_1054.2 AGCCCCCC 




i i i r 

2 4 6 8 
108 Zfp281_0973.2 CCTCCCCC 




i 1 r 

2 4 6 8 
113 Zic2_2895.2 G ACCCCCC 





T 1 \ 1 r 

2 4 6 8 
89 Sry_2833.2 ATTATAAT 




\ 1 1 r 

2 4 6 8 
94 Tcf7_0950.2 CTTTGATC 




\ — i — \ — i — r 

2 4 6 8 
!9Tcfap2e_3713.1 CCTCAGGC 




T 1 \ 1 r 

2 4 6 8 
104 Zfp105_2634.2 ACAAATAA 




i i i r 

2 4 6 8 
109 Zfp410_3034.2 ATGGGATG 




\ 1 1 r 

2 4 6 8 
114Zic3_3119.2CCCCCCGC 





T 1 \ 1 r 

2 4 6 8 
90 Tbp_pr781.1 TATATATA 




i — i — \ — i — r 

2 4 6 8 
95Tcf7l2_3461.1 CTTTGATC 




i 1 1 r 

2 4 6 8 
100Tcfe2a_3865.1 ACAGGTGC 




i 1 r 

2 4 6 8 
105 Zfp128_2806.2 GGCGTACC 




i i \ i r 

2 4 6 8 
110 Zfp691_0895.2 CAGTGCTC 




i 1 1 r 

2 4 6 8 
115 Zscan4_2667.2 TGTGCACA 




Figure D.ll: Landscapes of binding score between 8 base-pair DNA sequences and tran- 
scription factors (TF) from |30j . Ordinates show binding scores, and abscissae show Ham- 
ming distances from the top sequence (a sequence with the highest DNA-TF binding 
score). Each panel corresponds to a different transcription factor. Lines connect mean 
values of the binding score for each value of the Hamming distance from the top sequence. 
Errorbars represent standard deviations. Note that this dataset does not distinguish be- 
tween sequences on opposite strands of the DNA. Therefore, a sequence and its reverse 
complement are shown only once and the Hamming distance shown is either to the top 
sequence or its reverse complement, whichever is the closer. 
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